Efficiently Learning Synthetic Control Models for High-dimensional Disaggregated Data

Shen, Ye, Song, Rui, Abadie, Alberto

arXiv.org Machine Learning

The Synthetic Control method (SC) has become a valuable tool for estimating causal effects. Originally designed for single-treated unit scenarios, it has recently found applications in high-dimensional disaggregated settings with multiple treated units. However, challenges in practical implementation and computational efficiency arise in such scenarios. To tackle these challenges, we propose a novel approach that integrates the Multivariate Square-root Lasso method into the synthetic control framework. We rigorously establish the estimation error bounds for fitting the Synthetic Control weights using Multivariate Square-root Lasso, accommodating high-dimensionality and time series dependencies. Additionally, we quantify the estimation error for the Average Treatment Effect on the Treated (ATT). Through simulation studies, we demonstrate that our method offers superior computational efficiency without compromising estimation accuracy. We apply our method to assess the causal impact of COVID-19 Stay-at-Home Orders on the monthly unemployment rate in the United States at the county level.
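To make the setup concrete, here is a minimal sketch of the classic single-treated-unit synthetic control fit: non-negative donor weights summing to one, chosen to match the treated unit's pre-treatment outcomes. This is not the paper's Multivariate Square-root Lasso estimator; the Frank-Wolfe solver and all data below are illustrative assumptions.

```python
import numpy as np

# Synthetic data: 5 donor units observed over 40 pre-treatment periods, and a
# treated unit that is a true convex combination of donors 0 and 1.
rng = np.random.default_rng(0)
T, J = 40, 5
Y0 = rng.normal(size=(T, J))            # donor-pool outcomes
y1 = 0.7 * Y0[:, 0] + 0.3 * Y0[:, 1]    # treated unit's pre-treatment path

def fit_sc_weights(y1, Y0, iters=5000):
    """Minimize ||y1 - Y0 w||^2 over the probability simplex via Frank-Wolfe."""
    J = Y0.shape[1]
    w = np.full(J, 1.0 / J)
    for k in range(iters):
        grad = 2.0 * Y0.T @ (Y0 @ w - y1)
        s = np.zeros(J)
        s[np.argmin(grad)] = 1.0        # best vertex of the simplex
        gamma = 2.0 / (k + 2.0)
        w = (1.0 - gamma) * w + gamma * s
    return w

w = fit_sc_weights(y1, Y0)
print(np.round(w, 3))
```

Because every iterate is a convex combination of simplex vertices, the weights stay non-negative and sum to one by construction; the paper's contribution is replacing this kind of per-unit fit with a jointly estimated, computationally cheaper penalized regression for many treated units.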



What is in the model? A Comparison of variable selection criteria and model search approaches

Xu, Shuangshuang, Ferreira, Marco A. R., Tegge, Allison N.

arXiv.org Machine Learning

For many scientific questions, understanding the underlying mechanism is the goal. To help investigators better understand the underlying mechanism, variable selection is a crucial step that permits the identification of the regression variables most associated with the outcome of interest. A variable selection method consists of model evaluation using an information criterion and a search of the model space. Here, we provide a comprehensive comparison of variable selection methods using the performance measures of correct identification rate (CIR), recall, and false discovery rate (FDR). We consider the BIC and AIC for evaluating models, and exhaustive, greedy, LASSO path, and stochastic search approaches for searching the model space; we also consider LASSO with cross-validation. We perform simulation studies for linear and generalized linear models that parametrically explore a wide range of realistic sample sizes, effect sizes, and correlations among regression variables. We consider model spaces with small and large numbers of potential regressors. The results show that exhaustive search with BIC and stochastic search with BIC outperform the other methods on small and large model spaces, respectively. These approaches yield the highest CIR and lowest FDR, which collectively may support long-term efforts toward increasing replicability in research.
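The exhaustive-search-with-BIC strategy can be sketched in a few lines: fit OLS on every subset of candidate regressors and keep the subset with the lowest BIC. The data-generating process, the no-intercept Gaussian BIC formula, and the tiny model space below are illustrative assumptions, not the paper's simulation design.

```python
import itertools
import numpy as np

# Synthetic data with a known true model: y depends only on columns 0 and 2.
rng = np.random.default_rng(1)
n, p = 200, 6
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.1 * rng.normal(size=n)

def bic(subset):
    """BIC = n*log(RSS/n) + k*log(n) for the OLS fit on the given columns."""
    k = len(subset)
    if k == 0:
        rss = float(y @ y)
    else:
        Xs = X[:, subset]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
        rss = float(resid @ resid)
    return n * np.log(rss / n) + k * np.log(n)

# Exhaustive search: score all 2^p subsets and keep the best.
subsets = [s for r in range(p + 1) for s in itertools.combinations(range(p), r)]
best = min(subsets, key=bic)
print(best)
```

With a well-separated signal like this, BIC typically recovers the true subset {0, 2}; the exponential cost of scoring all 2^p subsets is exactly why the paper also compares greedy, LASSO path, and stochastic search alternatives for larger model spaces.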


FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction

Zeng, Zhiyuan, Liu, Jiashuo, Chen, Siyuan, He, Tianci, Liao, Yali, Tian, Yixiao, Wang, Jinpeng, Wang, Zaiyuan, Yang, Yang, Yin, Lingyue, Yin, Mingren, Zhu, Zhenwei, Cai, Tianle, Chen, Zehui, Chen, Jiecao, Du, Yantao, Gao, Xiang, Guo, Jiacheng, Hu, Liang, Jiao, Jianpeng, Li, Xiangsheng, Liu, Jingkai, Ni, Shuang, Wen, Zhoufutu, Zhang, Ge, Zhang, Kaiyuan, Zhou, Xin, Blanchet, Jose, Qiu, Xipeng, Wang, Mengdi, Huang, Wenhao

arXiv.org Artificial Intelligence

Future prediction is a complex task for LLM agents, requiring a high level of analytical thinking, information gathering, contextual understanding, and decision-making under uncertainty. Agents must not only gather and interpret vast amounts of dynamic information but also integrate diverse data sources, weigh uncertainties, and adapt predictions based on emerging trends, just as human experts do in fields like politics, economics, and finance. Despite its importance, no large-scale benchmark exists for evaluating agents on future prediction, largely due to challenges in handling real-time updates and retrieving timely, accurate answers. To address this, we introduce $\textbf{FutureX}$, a dynamic and live evaluation benchmark specifically designed for LLM agents performing future prediction tasks. FutureX is the largest and most diverse live benchmark for future prediction, supporting real-time daily updates and eliminating data contamination through an automated pipeline for question gathering and answer collection. We evaluate 25 LLM/agent models, including those with reasoning, search capabilities, and integration of external tools such as the open-source Deep Research Agent and closed-source Deep Research models. This comprehensive evaluation assesses agents' adaptive reasoning and performance in dynamic environments. Additionally, we provide in-depth analyses of agents' failure modes and performance pitfalls in future-oriented tasks, including their vulnerability to fake web pages and issues of temporal validity. Our goal is to establish a dynamic, contamination-free evaluation standard that drives the development of LLM agents capable of performing at the level of professional human analysts in complex reasoning and predictive thinking.


Children's Mental Models of AI Reasoning: Implications for AI Literacy Education

Dangol, Aayushi, Wolfe, Robert, Zhao, Runhua, Kim, JaeWon, Ramanan, Trushaa, Davis, Katie, Kientz, Julie A.

arXiv.org Artificial Intelligence

As artificial intelligence (AI) advances in reasoning capabilities, most recently with the emergence of Large Reasoning Models (LRMs), understanding how children conceptualize AI's reasoning processes becomes critical for fostering AI literacy. While one of the "Five Big Ideas" in AI education highlights reasoning algorithms as central to AI decision-making, less is known about children's mental models in this area. Through a two-phase approach, consisting of a co-design session with 8 children followed by a field study with 106 children (grades 3-8), we identified three models of AI reasoning: Deductive, Inductive, and Inherent. Our findings reveal that younger children (grades 3-5) often attribute AI's reasoning to inherent intelligence, while older children (grades 6-8) recognize AI as a pattern recognizer. We highlight three tensions that surfaced in children's understanding of AI reasoning and conclude with implications for scaffolding AI curricula and designing explainable AI tools.


ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights

Sarch, Gabriel, Jang, Lawrence, Tarr, Michael J., Cohen, William W., Marino, Kenneth, Fragkiadaki, Katerina

arXiv.org Artificial Intelligence

Large-scale generative language and vision-language models (LLMs and VLMs) excel in few-shot in-context learning for decision making and instruction following. However, they require high-quality exemplar demonstrations to be included in their context window. In this work, we ask: Can LLMs and VLMs generate their own prompt examples from generic, sub-optimal demonstrations? We propose In-Context Abstraction Learning (ICAL), a method that builds a memory of multimodal experience insights from sub-optimal demonstrations and human feedback. Given a noisy demonstration in a new domain, VLMs abstract the trajectory into a general program by fixing inefficient actions and annotating cognitive abstractions: task relationships, object state changes, temporal subgoals, and task construals. These abstractions are refined and adapted interactively through human feedback while the agent attempts to execute the trajectory in a similar environment. The resulting abstractions, when used as exemplars in the prompt, significantly improve decision-making in retrieval-augmented LLM and VLM agents. Our ICAL agent surpasses the state-of-the-art in dialogue-based instruction following in TEACh, multimodal web agents in VisualWebArena, and action anticipation in Ego4D. In TEACh, we achieve a 12.6% improvement in goal-condition success. In VisualWebArena, our task success rate improves over the SOTA from 14.3% to 22.7%. In Ego4D action forecasting, we improve over few-shot GPT-4V and remain competitive with supervised models. We show finetuning our retrieval-augmented in-context agent yields additional improvements. Our approach significantly reduces reliance on expert-crafted examples and consistently outperforms in-context learning from action plans that lack such insights.


Tech claiming to protect U.S. schools from mass shootings prompts growing unease

The Japan Times

LOS ANGELES – A few days after a gunman killed 19 children and two teachers in Uvalde, Texas, a year ago, taser-maker Axon Enterprise floated the idea of a "nonlethal" drone for schools that could be activated by AI-powered surveillance. It caused a stir -- prompting the company's own AI ethics advisory board to quit in protest and highlighting growing unease about the ethics and effectiveness of security tools being marketed aggressively by technology firms to U.S. schools. "I had to have my secretary screen out the calls from all these companies," said Rita Bishop, former superintendent of the school system in the city of Roanoke, Virginia, recalling sales pitches for everything from drones to AI-powered surveillance cameras and weapons detectors.


Deep Learning Architectures for FSCV, a Comparison

Twomey, Thomas, Barbosa, Leonardo, Lohrenz, Terry, Montague, P. Read

arXiv.org Artificial Intelligence

We examined multiple deep neural network (DNN) architectures for suitability in predicting neurotransmitter concentrations from labeled in vitro fast-scan cyclic voltammetry (FSCV) data collected on carbon fiber electrodes. Suitability is determined by predictive performance in the "out-of-probe" case, the response to artificially induced electrical noise, and the ability to predict when the model will be errant for a given probe. This work extends prior comparisons of time series classification models by focusing on this specific task. It extends previous applications of machine learning to the FSCV task by using a much larger data set and by incorporating recent advancements in deep neural networks. The InceptionTime architecture, a deep convolutional neural network, had the best absolute predictive performance of the models tested but was more susceptible to noise. A naive multilayer perceptron architecture had the second-lowest prediction error and was less affected by the artificial noise, suggesting that convolutions may not be as important for this task as one might suspect.
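The regression setup the paper evaluates can be illustrated with a toy example: map a sweep-like input vector to a scalar concentration with a one-hidden-layer MLP trained by gradient descent. The paper's actual models (InceptionTime, deeper MLPs) and the FSCV data are not reproduced here; every tensor, shape, and hyperparameter below is a synthetic stand-in.

```python
import numpy as np

# Synthetic stand-ins: 256 "voltammogram" vectors of length 32, each mapped to
# a noisy scalar "concentration" through an unknown linear rule.
rng = np.random.default_rng(2)
n, d, h = 256, 32, 16
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d) / np.sqrt(d)
y = X @ true_w + 0.05 * rng.normal(size=n)

# One hidden tanh layer, scalar linear output.
W1 = rng.normal(size=(d, h)) * 0.1
b1 = np.zeros(h)
W2 = rng.normal(size=(h,)) * 0.1
b2 = 0.0

losses = []
lr = 0.05
for _ in range(500):
    H = np.tanh(X @ W1 + b1)            # hidden activations
    pred = H @ W2 + b2
    err = pred - y
    losses.append(float(np.mean(err ** 2)))
    # Backprop (gradients of half the mean squared error).
    gW2 = H.T @ err / n
    gb2 = err.mean()
    dH = np.outer(err, W2) * (1.0 - H ** 2)   # through the tanh nonlinearity
    gW1 = X.T @ dH / n
    gb1 = dH.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print(round(losses[0], 3), round(losses[-1], 3))
```

The paper's point is that on real FSCV data the choice among such architectures trades raw accuracy against noise robustness, a distinction this toy fit cannot capture.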


Waiting For Self-Driving Cars

#artificialintelligence

Once you trust a self-driving car with your life, you pretty much will trust Artificial Intelligence with anything--Dave Waters. Keith Kirkpatrick is the author of an interesting CACM article on self-driving cars. It is titled "Still Waiting For Self-Driving Cars" and appears in the news section of this month's issue. Today we discuss why it has been so difficult to get self-driving cars started. Over the past decade, technology and automotive pundits have predicted the "imminent" arrival of fully autonomous vehicles that can drive on public roads without any active monitoring or input from a human driver.